1 Project Overview

Experimental  data  play  an  essential  role  in  developing  and  refining  models  of  physical  systems. 
Data  are  used  to  infer  model  parameters,  to  improve  the  accuracy  of  model-based  predictions, 
to  assess  the  validity  of  models,  and  to  improve  design  and  decision-making  under  uncertainty. 
Yet  experimental  observations  can  be  difficult,  time-consuming,  and  expensive  to  acquire.  In  this 
context,  maximizing  the  value  of  experimental  observations — designing  experiments  to  be  optimal 
by  some  appropriate  measure — is  a  critical  task.  Experimental  design  encompasses  questions  of 
where  and  when  to  measure,  which  variables  to  interrogate,  and  what  experimental  conditions 
to  employ.  While  theory  and  algorithms  for  optimal  design  have  been  developed  for  many  linear 
parameter  estimation  problems,  rigorous  and  computationally  tractable  methods  for  optimal  design 
with  nonlinear  simulation-based  models  have  been  sorely  lacking. 

This  project  addresses  open  challenges  in  optimal  experimental  design  (OED)  for  complex 
physical systems, taking a Bayesian decision-theoretic approach. Our focus has been on generically
nonlinear systems and information-theoretic design objectives, for which existing theory and
computational  tools  have  been  inadequate.  As  described  below,  our  goal  has  been  to  develop 
new  mathematical  formulations,  estimation  approaches,  and  approximation  strategies 
to  make  rigorous  OED  feasible  for  systems  accessible  only  through  computational  simulation.  Work 
in  this  project  had  two  major  thrusts: 

• Innovations in batch optimal experimental design, where all experiments are planned simultaneously
before they are implemented. A key output of this thrust is a new multiple
importance  sampling  scheme  for  estimating  expected  information  gain  (EIG).  EIG  is  a  central 
measure  of  the  information  due  to  an  experiment,  and  our  new  estimator  achieves  multiple 
orders  of  magnitude  smaller  error  (bias  and  variance),  for  a  given  computational  effort,  than 
previous  schemes. 

Coupled with this estimator is a new formulation for focused experimental design, i.e., experimental
design in the presence of nuisance parameters. Very often the goal of an experiment
is to learn about a particular quantity of interest, yet other aspects of the system remain
uncertain. Focused design maximizes the expected information gain in the marginal distribution
of the parameters of interest without ignoring these other uncertainties; it can lead to
very different design configurations than previous (unfocused) schemes. A natural extension
of focused experimental design is the notion of experimental designs that account for model
error, when model error is itself captured by a statistical inadequacy or discrepancy model.

•  New  formulations  and  computational  methods  for  sequential  optimal  experimental  design. 
Typical  current  practice  for  designing  multiple  experiments  uses  suboptimal  approaches: 
open-loop design that chooses all experiments simultaneously with no feedback of information,
or greedy design that optimally selects the next experiment without accounting for future observations
and dynamics. By contrast, sequential optimal experimental design (sOED) is free
of  these  limitations. 

We have rigorously formulated sOED as a dynamic programming (DP) problem, and developed
new numerical tools to enable DP in the context of nonlinear models with continuous
(and often unbounded) parameter, design, and observation spaces. Two major techniques are
employed to make solution of the DP problem computationally feasible. First, the optimal
policy is sought using a one-step lookahead representation combined with approximate value
iteration. This approximate dynamic programming method couples backward induction and
regression to construct value function approximations. It also iteratively generates trajectories
via exploration and exploitation to further improve approximation accuracy in frequently
visited regions of the state space. Second, transport maps are used to represent belief states,
which reflect the intermediate posteriors within the sequential design process. Transport
maps offer a finite-dimensional representation of these generally non-Gaussian random variables,
and also enable fast approximate Bayesian inference, which must be performed millions
of times under nested combinations of optimization and Monte Carlo sampling.

Collectively,  this  work  has  advanced  the  state  of  the  art  in  optimal  experimental  design,  yielding 
new  computational  approaches  applicable  to  a  wide  range  of  Air  Force  relevant  problems,  ranging 
from  object  detection  and  inverse  scattering  to  UAV  path  planning.  The  technical  accomplishments 
are  detailed  below. 

2  Technical  Accomplishments 

2.1  Efficient  methods  for  focused  experimental  design 

We  examine  the  optimal  design  of  experiments  when  the  experimental  goal  is  the  inference  of  a 
subset of model parameters. In many scenarios, models have physical parameters and tuning parameters,
but we may wish to prioritize information gain in the physical parameters over information
gain  in  the  tuning  parameters. 

We formulate the experimental design problem in a decision-theoretic framework where the objective
function is the expected information gain in only the parameters of interest. In a Bayesian
setting, the information gain in the parameters of interest is represented by the difference in information
carried by the prior and posterior distributions, which reflect our knowledge of the model
parameters before and after observing data, respectively. Unlike existing formulations, we look at
the information gain in the marginal distributions of the parameters of interest, where the influence
of other so-called nuisance parameters has been integrated out, so that our objective function
only  reflects  information  gain  in  the  parameters  of  interest.  This  allows  us  to  exploit  tradeoffs 
in  learning  between  subsets  of  model  parameters  which  may  be  overlooked  if  our  objective  were 
information  gain  in  all  of  the  model  parameters. 
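
As a hedged illustration of the objective described above, one common way to write a focused expected information gain is given below. The notation is assumed here and does not come from the report: $\theta$ denotes the parameters of interest, $\eta$ the nuisance parameters, $y$ the data, and $d$ the design, and the prior is taken to be independent of $d$.

```latex
U(d) = \mathbb{E}_{y \mid d}\left[ D_{\mathrm{KL}}\big( p(\theta \mid y, d) \,\|\, p(\theta) \big) \right]
     = \iint p(y, \theta \mid d) \, \ln \frac{p(y \mid \theta, d)}{p(y \mid d)} \, \mathrm{d}\theta \, \mathrm{d}y,
\qquad
p(y \mid \theta, d) = \int p(y \mid \theta, \eta, d) \, p(\eta \mid \theta) \, \mathrm{d}\eta .
```

The nuisance parameters enter only through the marginal likelihood $p(y \mid \theta, d)$, so their uncertainty still affects the objective even though no credit is given for learning them.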

In practice, most experimental design problems do not lend themselves to a fully analytic
treatment of the expected utility. Existing approaches for estimating the expected information gain
are limited by computational expense: a two order-of-magnitude
gain in computational efficiency would be required even to discriminate among the enumerated
designs. To this end, we have developed an efficient layered, incremental multiple importance
sampling scheme for estimating the expected information gain. It delivers the orders-of-magnitude
reduction in estimator error needed to make solving the exact optimal design problem
tractable.

Instead of using a naive Monte Carlo estimator which draws samples from the prior distribution
to estimate the posterior quantities of interest (which can be extremely inefficient when the
prior and the posterior differ significantly, as is the case when data are informative), our approach
incrementally approximates the posterior distribution using information from existing Monte Carlo
samples that would have been discarded by the naive estimator. It remains asymptotically unbiased
by using the posterior approximations indirectly as biasing distributions for unbiased importance
sampling estimates. With this approach, we not only observe significant pointwise reduction
in  bias  and  variance,  but  we  also  observe  a  reduction  in  the  sensitivity  of  the  estimator  bias  with 
respect  to  the  experimental  design,  which  is  especially  important  in  the  context  of  optimal  design 
where  correlations  between  the  bias  in  the  objective  function  and  the  design  variable  can  result  in 
suboptimal  results. 
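
For orientation, the naive baseline discussed above can be sketched as a nested Monte Carlo estimator. This is only a minimal sketch of the prior-sampling estimator, not the layered multiple importance sampling scheme developed in this project. The scalar linear Gaussian model, the function names `eig_nested_mc` and `eig_exact`, and all default values are assumptions chosen so that a closed-form answer is available for comparison.

```python
import numpy as np


def eig_nested_mc(d, sigma=0.5, n_outer=2000, n_inner=2000, seed=0):
    """Naive nested Monte Carlo estimate of expected information gain.

    Toy model (assumed for illustration): theta ~ N(0, 1) and
    y = d * theta + noise, with noise ~ N(0, sigma^2).
    """
    rng = np.random.default_rng(seed)
    theta_outer = rng.standard_normal(n_outer)        # prior samples
    y = d * theta_outer + sigma * rng.standard_normal(n_outer)
    theta_inner = rng.standard_normal(n_inner)        # fresh prior samples

    def log_lik(y_val, theta):
        resid = (y_val - d * theta) / sigma
        return -0.5 * resid ** 2 - np.log(sigma * np.sqrt(2.0 * np.pi))

    # Log-likelihood of each observation at the parameter that generated it.
    ll_outer = log_lik(y, theta_outer)

    # Log evidence log p(y_i | d), estimated by averaging the likelihood over
    # prior samples; the max is subtracted for numerical stability.
    ll = log_lik(y[:, None], theta_inner[None, :])    # (n_outer, n_inner)
    ll_max = ll.max(axis=1, keepdims=True)
    log_evidence = ll_max[:, 0] + np.log(np.mean(np.exp(ll - ll_max), axis=1))

    return float(np.mean(ll_outer - log_evidence))


def eig_exact(d, sigma=0.5):
    """Closed-form expected information gain for the toy model above."""
    return 0.5 * np.log(1.0 + (d / sigma) ** 2)


estimate = eig_nested_mc(1.0)
reference = eig_exact(1.0)
```

The inner average over prior samples is where the inefficiency arises: when the data are informative, few prior samples land where the likelihood is large, so the evidence estimate is noisy and the overall estimator is biased for finite inner sample sizes.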

We applied our approach to two experimental design problems. The first is a 4-dimensional linear Gaussian
problem where three of the four parameters are nuisance parameters. The gain matrix has entries
that  are  functions  of  the  1-dimensional  design  parameter  that  were  chosen  to  create  a  clear  tradeoff 
between  designs.  In  Figure  1,  we  observe  the  orders-of-magnitude  decrease  in  the  estimator  error, 
and  how  using  the  new  approach  allows  us  to  find  the  correct  optimal  design  at  d  =  1.  The  second 
example describes a Mössbauer spectroscopy experiment, where the goal is to choose measurement
locations on the horizontal axis that result in data that are informative for inferring the parameters
that describe an absorption peak. In Figure 2, we plot the posterior densities upon observing
simulated data, and observe that the variances of the marginal densities for the parameters of
interest are smaller when we explicitly target those parameters in our focused experimental design
framework.
[Figure 1 panels. Legend: Prior, IS, MIS, Posterior. Horizontal axis: Design d]


Figure 1: (Above) Estimated expected information gain for a 4D linear Gaussian model. Red is the naive approach, and green is our approach. Purple is the theoretical lower bound. Note the incorrect location of the maximum when using the naive approach. (Left) Mean square error. Note the two orders-of-magnitude decrease in MSE.
Figure 2: Our approach successfully captures the tradeoff in targeting information gain in different model
parameters.
2.2 Sequential optimal experimental design

2.2.1 Formulation

Common  practice  for  designing  a  sequence  of  experiments  uses  suboptimal  approaches:  batch  design 
that  has  no  feedback,  or  greedy  (myopic)  design  that  optimally  selects  only  the  next  experiment 
without accounting for future effects and dynamics. Sequential optimal experimental design
(sOED) has the advantages of

1.  making  use  of  newly  acquired  information  during  the  design  process  to  guide  designs  of 
subsequent  experiments  (i.e.,  feedback),  and 

2. taking into account all future effects and dynamics.

We now seek the optimal policy, which consists of functions that decide what the best design is,
given the updated current situation (state) (see Figure 3).


(a)  Batch  (open-loop)  design  (b)  Sequential  (closed-loop)  design 


Figure  3:  Batch  design  exhibits  an  open-loop  behavior,  where  no  feedback  is  involved,  and  the  observations 
from  any  experiment  do  not  affect  the  design  of  other  experiments.  Sequential  design  exhibits  a  closed-loop 
behavior,  where  feedback  occurs,  and  the  data  from  one  experiment  are  used  to  guide  the  design  of  future 
experiments.  “System  dynamics”  is  the  process  that  updates  the  state  from  one  experiment  to  the  next. 


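The sOED problem for N experiments involves finding the optimal policy $\pi^* = \{\mu_0^*, \ldots, \mu_{N-1}^*\}$
that maximizes the expected utility (reward):

$$\pi^* = \arg\max_{\pi} \; \mathbb{E}_{y_0, \ldots, y_{N-1} \mid \pi} \left[ \sum_{k=0}^{N-1} g_k(x_k, y_k, \mu_k(x_k)) + g_N(x_N) \right] \qquad (1)$$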


Here $d_k = \mu_k(x_k)$ is the design for the $k$th experiment, $y_k$ are the observations, $x_k$ is the state
(composed of a belief state component $x_{k,b}$ that describes uncertainty, and a physical state component
$x_{k,p}$ that describes deterministic factors), $g_k$ is the stage reward, and $g_N$ is the terminal reward.
The states must adhere to the system dynamics $x_{k+1} = \mathcal{F}_k(x_k, y_k, d_k)$, and the policy is subject to
any design space constraints $\mu_k(x_k) = d_k \in \mathcal{D}_k$.

We focus on designing experiments to infer the model parameter $\theta$ from noisy observations $y_k$.
To achieve this, we adopt a Bayesian perspective, and choose the belief state to be the posterior
$x_{k,b} = \theta \mid d_0, y_0, \ldots, d_{k-1}, y_{k-1}$, the system dynamics to be Bayes’ theorem, and the terminal reward to be
the Kullback-Leibler (KL) divergence from the final posterior to the initial prior, an
information-measuring criterion.
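
Written out, this terminal reward takes the standard KL form below. The shorthand $I_k = \{d_0, y_0, \ldots, d_{k-1}, y_{k-1}\}$ for the history of designs and observations is an assumption introduced here for compactness, with $I_0$ empty so that $p(\theta \mid I_0)$ is the initial prior.

```latex
g_N(x_N) = D_{\mathrm{KL}}\big( p(\theta \mid I_N) \,\|\, p(\theta \mid I_0) \big)
         = \int p(\theta \mid I_N) \, \ln \frac{p(\theta \mid I_N)}{p(\theta \mid I_0)} \, \mathrm{d}\theta .
```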
Equation (1) is difficult to solve directly, but can be expressed in an equivalent form that is easier to tackle, using the
principle of dynamic programming (DP):

$$J_k(x_k) = \max_{d_k \in \mathcal{D}_k} \mathbb{E}_{y_k \mid x_k, d_k} \left[ g_k(x_k, y_k, d_k) + J_{k+1}\big(\mathcal{F}_k(x_k, y_k, d_k)\big) \right] \qquad (2)$$

$$J_N(x_N) = g_N(x_N). \qquad (3)$$

The $J_k(x_k)$ functions are called value functions, and the optimal policy is described by the argument
that maximizes the right-hand side of these equations.
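
To make the backward recursion concrete, here is a minimal sketch on a toy problem that is small enough to solve on a grid. Everything about the toy is an assumption made for illustration: a binary parameter, a scalar belief state b = P(theta = 1), two candidate designs with hypothetical accuracies and costs, binary observations, and a terminal reward equal to the KL divergence from the final belief to the prior. The function name `solve_toy_soed` is likewise hypothetical.

```python
import numpy as np


def kl_bernoulli(p, q):
    """KL divergence from Bernoulli(p) to Bernoulli(q), elementwise in p."""
    p = np.clip(p, 1e-12, 1.0 - 1e-12)  # avoid log(0) at the grid end points
    return p * np.log(p / q) + (1.0 - p) * np.log((1.0 - p) / (1.0 - q))


def solve_toy_soed(n_exp, prior=0.5, accuracy=(0.6, 0.9), cost=(0.0, 0.05),
                   n_grid=201):
    """Backward induction over a gridded belief state b = P(theta = 1).

    Design d reports theta correctly with probability accuracy[d] and
    incurs a stage reward of -cost[d].
    """
    grid = np.linspace(0.0, 1.0, n_grid)
    J = kl_bernoulli(grid, prior)            # terminal condition J_N = g_N
    policy = []
    for _ in range(n_exp):                   # k = N-1, ..., 0
        Q = np.empty((len(accuracy), n_grid))
        for d, a in enumerate(accuracy):
            p_y1 = a * grid + (1.0 - a) * (1.0 - grid)   # P(y = 1 | b, d)
            post_y1 = a * grid / p_y1                    # Bayes update, y = 1
            post_y0 = (1.0 - a) * grid / (1.0 - p_y1)    # Bayes update, y = 0
            # Expectation over y of stage reward plus next-stage value.
            Q[d] = (-cost[d]
                    + p_y1 * np.interp(post_y1, grid, J)
                    + (1.0 - p_y1) * np.interp(post_y0, grid, J))
        policy.insert(0, Q.argmax(axis=0))   # maximizing design at each belief
        J = Q.max(axis=0)                    # value function one stage earlier
    return grid, J, policy


grid, J0, policy = solve_toy_soed(n_exp=2)
```

Each entry of `policy` maps a belief grid point to a design index, which is the closed-loop behavior described above: the chosen design depends on the state reached so far. Gridding the belief state is feasible only because the toy belief is one-dimensional; the continuous, non-Gaussian belief states targeted in this project are what motivate approximate methods.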


2.2.2  Approximate  dynamic  programming 

Equations (2) and (3) must be solved approximately and numerically using approximate dynamic programming
(ADP) techniques. We take an approach using the “one-step lookahead” representation,
whose underlying idea is to construct functions $\tilde{J}_k$ that approximate $J_k$, for all $k$. Once these approximate
value functions $\tilde{J}_k$ are constructed, the approximate optimal policy can then be extracted
via one step of lookahead:

$$\mu_k(x_k) = \arg\max_{d_k \in \mathcal{D}_k} \mathbb{E}_{y_k \mid x_k, d_k} \left[ g_k(x_k, y_k, d_k) + \tilde{J}_{k+1}\big(\mathcal{F}_k(x_k, y_k, d_k)\big) \right] \qquad (4)$$

for $k = 0, \ldots, N-1$, and with $\tilde{J}_N(x_N) = g_N(x_N)$.

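We choose to numerically represent the $\tilde{J}_k$ using a simple and intuitive parametric linear architecture,
and construct them through the backward induction procedure:

$$\tilde{J}_k(x_k) = r_k^\top \phi_k(x_k) = \Pi \left\{ \max_{d_k \in \mathcal{D}_k} \mathbb{E}_{y_k \mid x_k, d_k} \left[ g_k(x_k, y_k, d_k) + \tilde{J}_{k+1}\big(\mathcal{F}_k(x_k, y_k, d_k)\big) \right] \right\} = \Pi \hat{J}_k(x_k). \qquad (5)$$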


Here $r_k$ is a vector of scalar coefficients, and $\phi_k$ are basis functions or features. The induction
procedure starts at the end with $\tilde{J}_N(x_N) = g_N(x_N)$ and then proceeds backwards from $k =
N - 1$ to $k = 1$. A suitable selection for the approximation operator $\Pi$ is linear regression, which
offers  flexibility  for  generating  regression  points  from  a  combination  of  exploration  and  exploitation 
strategies.  We  further  developed  an  iterative  method  to  improve  the  exploitation  strategy  as  we  gain 
a  better  understanding  of  the  characteristics  of  good  policies  from  samples  generated  throughout 
the  procedure. 
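
The regression step can be sketched as an ordinary least-squares fit of a linear architecture to sampled targets. This is a minimal sketch under stated assumptions: the scalar state, the polynomial features in `features`, and the synthetic quadratic target are all hypothetical stand-ins. In the actual procedure the targets would be noisy evaluations of the maximized expectation at the regression points, and the features would be chosen for the belief and physical state.

```python
import numpy as np


def features(x):
    """Hypothetical features phi(x) = [1, x, x^2] for a scalar state x."""
    x = np.asarray(x, dtype=float)
    return np.stack([np.ones_like(x), x, x ** 2], axis=-1)


def fit_value_function(states, targets):
    """Coefficients r minimizing ||features(states) @ r - targets||_2."""
    r, *_ = np.linalg.lstsq(features(states), targets, rcond=None)
    return r


def evaluate(r, x):
    """Approximate value r^T phi(x)."""
    return features(x) @ r


# Synthetic stand-in for the regression targets at sampled states.
rng = np.random.default_rng(0)
states = rng.uniform(-1.0, 1.0, size=200)
targets = 1.0 + 2.0 * states - 3.0 * states ** 2 + 0.01 * rng.standard_normal(200)

r = fit_value_function(states, targets)
value_at_half = evaluate(r, 0.5)
```

Where the regression points `states` come from matters: the fit is most accurate where samples are dense, which is why drawing them from a mix of exploration and exploitation trajectories concentrates accuracy on states that good policies actually visit.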


2.2.3  Transport  maps 

While ADP addresses the optimality aspect of sOED, another major difficulty remains: to numerically
represent non-Gaussian, continuous random variable posteriors (i.e., belief states) that arise
naturally from inference for nonlinear models. In particular, we need to use a representation that
allows Bayesian inference to be performed many (millions of) times in a feasible manner under
different candidate designs, observations, and priors within the ADP procedure. A suitable choice
is the transport map: a function $T$ that transforms a random variable $z$ to another random
variable $\xi$ such that $\xi \overset{\text{i.d.}}{=} T(z)$, where $\overset{\text{i.d.}}{=}$ denotes equality in distribution. For example, Figure 4
illustrates a log-normal random variable $z$ mapped to a standard Gaussian random variable $\xi$ via
$\xi = T(z) = \ln(z)$.
Figure 4: A log-normal random variable $z$ can be mapped to a standard Gaussian random variable $\xi$ via
$\xi = T(z) = \ln(z)$.
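
The log-normal example can be checked numerically with a few lines. This is a minimal sketch that assumes a standard log-normal (the logarithm of the variable has zero mean and unit variance); the function name `lognormal_to_gaussian` is hypothetical.

```python
import numpy as np


def lognormal_to_gaussian(z):
    """Transport map T(z) = ln(z) from a standard log-normal to N(0, 1)."""
    return np.log(z)


rng = np.random.default_rng(0)
z = rng.lognormal(mean=0.0, sigma=1.0, size=100_000)  # samples of z
xi = lognormal_to_gaussian(z)                         # pushed-forward samples

sample_mean = xi.mean()
sample_std = xi.std()
```

The sample mean and standard deviation of `xi` should be close to 0 and 1. The inverse map `np.exp` goes the other way, turning standard Gaussian draws into log-normal samples.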


A special form of multivariate transport map that has a triangular structure, the Knothe-Rosenblatt
(KR) map, is particularly useful for performing Bayesian inference repeatedly. For
example, in one experiment, with $d$ being the design variable, $y$ the observations, and $\theta$ the parameter
to be inferred, we can construct a KR map on the joint distribution of $(d, y, \theta)$:

$$\begin{bmatrix} \xi_1 \\ \xi_2 \\ \xi_3 \end{bmatrix} = \begin{bmatrix} T_d(d) \\ T_{y|d}(d, y) \\ T_{\theta|y,d}(d, y, \theta) \end{bmatrix}$$

where $\xi_1, \xi_2, \xi_3$ are i.i.d. standard Gaussians. Because of the triangular structure of variable dependence,
Bayesian inference with a particular design $d^*$ and observations $y^*$ simply involves conditioning
on (i.e., substituting) these values in the last row of the joint map, to arrive at the corresponding
posterior map. As a result, this inference-via-conditioning process can be repeated for different
designs and observations at an extremely low computational cost. This concept is extended to
multiple experiments, leading to a higher-dimensional joint map that allows Bayesian inference to
be carried out easily for any number of experiments. The joint map is also easy to construct, as
it involves solving a convex optimization problem that can be easily separated into independent
sub-problems for each dimension, and requires only samples from the target distribution, which
are available through the aforementioned exploration and exploitation. The use of transport maps
plays  a  crucial  role  in  making  the  overall  sOED  method  tractable. 
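
The inference-by-conditioning idea can be illustrated in the one case where the triangular map is available in closed form: a jointly Gaussian pair, for which the map is given by a Cholesky factor. This is a minimal sketch under stated assumptions, not the map construction used in this project. The design variable is omitted, the toy model (theta ~ N(0, 1), y = 2 theta + noise with noise variance 0.25) is assumed, and the function name `kr_posterior_map` is hypothetical.

```python
import numpy as np


def kr_posterior_map(mean, cov, y_star):
    """Posterior sampler for theta given y = y_star, for jointly Gaussian (y, theta).

    With cov = L L^T and L lower triangular, the joint satisfies
    (y, theta) = mean + L @ (xi_1, xi_2), a triangular map whose rows can be
    inverted one at a time.
    """
    L = np.linalg.cholesky(cov)
    xi_y = (y_star - mean[0]) / L[0, 0]   # first row, evaluated at the data

    def theta_from_reference(xi_theta):
        # Last row with y fixed: maps N(0, 1) draws to posterior draws.
        return mean[1] + L[1, 0] * xi_y + L[1, 1] * np.asarray(xi_theta)

    return theta_from_reference


# Joint covariance of (y, theta) implied by the assumed toy model.
mean = np.array([0.0, 0.0])
cov = np.array([[4.25, 2.0],
                [2.0, 1.0]])

posterior = kr_posterior_map(mean, cov, y_star=1.0)
samples = posterior(np.random.default_rng(0).standard_normal(50_000))
```

Once the joint factor is available, conditioning on a new observation costs one substitution, so the same map can be reused for many candidate observations. For non-Gaussian joints the rows are nonlinear monotone functions rather than affine ones, but the substitution step has the same structure.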

2.2.4  Results 

The sOED method is successfully demonstrated on realistic applications of optimal sensor placement.
In the scenario of a chemical/biological contaminant spill, we design a sequence of locations
for measuring contaminant concentrations for the purpose of inferring the contaminant source
location. This sequential design problem requires a balance of planning ahead for future wind
conditions, attaining high measurement signals, and reducing vehicle movement costs, which is
captured within the method developed.

In one example, we design four experiments in one-dimensional physical space. Figure 5 displays
distributions of total rewards from 1000 simulated trajectories as a function of iteration for refining
the exploitation measure, and the expected utility (mean) values are connected by the dashed blue
line. A clear advantage of iteration is observed as the expected utility increases sharply after the
first stage. Indeed, a good policy is achieved after the second iteration, and the expected utility
values of sOED are much higher than that achieved from an exploration policy (-2.0). We have also
demonstrated the advantages of sOED over batch OED and greedy design approaches
(results not shown here), further supporting the near-optimality of our results.

DISTRIBUTION A: Distribution approved for public release.


Figure 5: 1D contaminant source inversion problem: total reward distributions from 1000 simulated trajectories
over increasing iterations of exploitation update $\ell$. The blue dashed line connects the means of the
distributions.

In a more challenging setting, designing three experiments in two-dimensional physical space, the
histograms for designs $d_0$, $d_1$, and $d_2$ from 1000 simulated trajectories are shown in Figure 6. Each
$d_k$ has two components, corresponding to the two physical space dimensions. The movement trend
of the sensor corresponds to the expectation of future wind conditions that starts blowing to the
north and northeast. The pairwise and marginal distributions from samples used to construct the
joint map, and samples generated from that map, are shown in Figure 7. The distributions exhibit
extremely non-Gaussian, heavy-tailed, and even multi-modal behavior. Nonetheless, the map is still
able to capture these characteristics well, with the map-generated distributions (right) matching
well with their counterparts that were used to construct the map (left).


Figure 6: 2D contaminant source inversion problem: $d_k$ histograms from 1000 simulated trajectories.


3  Personnel  Supported 


Two  graduate  research  assistants  (partial):  X  Huan,  C  Feng.  Faculty  summer  support:  Y  Marzouk. 

Figure 7: 2D contaminant source inversion problem: pairwise and marginal distributions from samples
used to construct the exploration map (left), and from samples generated from the resulting map (right).